The Electricity Consumption and Occupancy (ECO) dataset is a publicly available dataset that contains information about the electricity consumption and occupancy of six Swiss households over a period of about eight months. The dataset was created to support research on household energy consumption, such as non-intrusive load monitoring and occupancy detection, and is also useful for work on energy management and sustainability.
The ECO dataset includes aggregate smart meter data for each household (the total power and the power on each of the three phases) as well as plug-level data for selected appliances, such as the fridge, kettle, and PC. The data is collected at a high frequency of one reading per second. The dataset also includes information on the occupancy of each household at different times of the day, which is measured using occupancy sensors or manually recorded data.
The ECO dataset is available in several different formats, including raw data files and pre-processed data files. The raw data files contain the original data collected from the households’ meters and sensors, while the pre-processed data files contain aggregated data that has been cleaned and normalized for easier analysis. The pre-processed data files may also include additional information, such as weather data, to help contextualize the energy consumption and occupancy data.
The ECO dataset has been used in a variety of research studies related to energy management and sustainability. For example, the dataset has been used to develop and test algorithms for energy management systems, to analyze patterns in energy consumption and occupancy, and to evaluate the effectiveness of energy-saving interventions. The dataset is a valuable resource for researchers and practitioners in this field, as it provides a detailed view of energy consumption and occupancy in real-world households.
For this homework, I focus on Household 5. First, I look at the overall smart meter data for the household from 27.06.12 to 31.01.13. Then, I dive deeper into the dataset by taking a single day and examining how the energy consumption of the different appliances in the household varies throughout the day.
Data Science Questions
What is the average total power consumption and average power consumption for phases 1, 2, and 3 for Household 5 from 27.06.12 to 31.01.13, based on smart meter data?
What is the energy consumption over time for each appliance in a household, and how does the consumption of each appliance compare to the others?
Code
import pandas as pd
import numpy as np
from pathlib import Path
import altair as alt
import datetime
import plotly.graph_objects as go
import plotly.subplots as sp
from functools import reduce

alt.data_transformers.enable('default', max_rows=None)
DataTransformerRegistry.enable('default')
Data Preparation and EDA for Question 1
Code
# Path of the data files
path = r'/Users/aanchaldusija/Downloads/hw4-spring-2023-aanchal-dusija/eco/05_sm'

# Get the files from the path provided in the OP
files = Path(path).glob('*.csv')  # .rglob to get subdirectories
Code
# concatenate all the files into one dataframe and add a new column with the filename
df5 = pd.concat(
    (pd.read_csv(f, header=None).assign(filename=f.name) for f in files),
    ignore_index=True
)
Code
# create average of every column based on the filename and show filename column
# this creates the average over each day for each column
daywisedf5 = df5.groupby('filename').mean().reset_index()
Code
# remove .csv from filename and convert to datetime
# regex=False treats '.csv' as a literal string rather than a regular expression
daywisedf5['filename'] = daywisedf5['filename'].str.replace('.csv', '', regex=False)
daywisedf5['filename'] = pd.to_datetime(daywisedf5['filename'], format='%Y-%m-%d')
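The chart code later in this report refers to columns named Date, powerallphases, powerl1, powerl2, and powerl3, whereas read_csv with header=None yields integer column labels and the date is stored under filename. A minimal sketch of the renaming step follows. It assumes that the first four columns of each smart meter file hold the total power followed by the three phase powers; that column order is an assumption and should be checked against the dataset documentation. The helper name_columns is hypothetical and not part of the original notebook.

```python
import pandas as pd

# Assumed order of the first four smart meter columns (hypothetical; verify
# against the dataset documentation before relying on it).
ASSUMED_NAMES = {0: "powerallphases", 1: "powerl1", 2: "powerl2", 3: "powerl3"}


def name_columns(daywise: pd.DataFrame) -> pd.DataFrame:
    """Rename integer column labels and expose the parsed filename as 'Date'."""
    renamed = daywise.rename(columns=ASSUMED_NAMES)
    return renamed.rename(columns={"filename": "Date"})


# Tiny stand-in for daywisedf5 with made-up values.
demo = pd.DataFrame({
    "filename": pd.to_datetime(["2012-06-27", "2012-06-28"]),
    0: [300.0, 320.0],
    1: [100.0, 110.0],
    2: [120.0, 115.0],
    3: [80.0, 95.0],
})
named = name_columns(demo)
print(named.columns.tolist())
```

Applied to the real data, this would be called as daywisedf5 = name_columns(daywisedf5) before any chart is built.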
From the correlation matrix, we can observe that the power consumption of each phase is highly correlated with the total power consumption. This is expected, as the total power consumption is the sum of the power consumption of the three phases. The power consumption of the individual phases is also highly correlated with each other. This is also expected, as the power consumption of the different phases should be similar.
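The correlation matrix discussed above can be computed with DataFrame.corr on the numeric columns only. The sketch below uses synthetic data, so the values are made up for illustration and are not ECO measurements. Selecting the numeric columns first keeps the date column out of the calculation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic daily averages: a shared household load split unevenly over
# three phases, plus small independent noise (illustrative values only).
base = rng.normal(500.0, 100.0, n)
phases = pd.DataFrame({
    "powerl1": 0.5 * base + rng.normal(0.0, 5.0, n),
    "powerl2": 0.3 * base + rng.normal(0.0, 5.0, n),
    "powerl3": 0.2 * base + rng.normal(0.0, 5.0, n),
})
phases.insert(0, "powerallphases", phases.sum(axis=1))
phases["Date"] = pd.date_range("2012-06-27", periods=n, freq="D")

# Keep only numeric columns so the datetime column is not passed to corr().
corr = phases.select_dtypes(include="number").corr()
print(corr.round(2))
```

Because the total is constructed as the sum of the three phases, its correlation with each phase is high by construction, which mirrors the reasoning in the paragraph above.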
Data Preparation for Question 2
Code
import pandas as pd

dates = ["2012-09-27"]
data_plugs = {}

for date in dates:
    data_list = []
    for i in range(1, 9):
        if i != 3:  # Skipping the third plug as it's missing in the original code
            file_path = f"~/Downloads/hw4-spring-2023-aanchal-dusija/eco/05_plugs/0{i}/{date}.csv"
            data = pd.read_csv(file_path, header=None)
            data_list.append(data)
    # Place the seven plug readings side by side, one column per appliance
    data_merged = pd.concat(data_list, axis=1)
    data_merged.columns = ["Tablet", "CoffeeMachine", "Microwave", "Fridge", "Entertainment", "PC", "Kettle"]
    data_plugs[date] = data_merged

# Keep only the dataframe for the single selected day
data_plugs = data_plugs[date]
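The plotting code further down reads a time_var column from data_plugs and treats the remaining columns as appliances, but the cell above does not create that column. A minimal sketch of one way to add it follows. It assumes each plug file holds one reading per second starting at midnight, which is an assumption about the file layout. The helper add_time_column is hypothetical.

```python
import numpy as np
import pandas as pd


def add_time_column(plugs: pd.DataFrame, date: str) -> pd.DataFrame:
    """Insert a 'time_var' column as the first column, one timestamp per row."""
    out = plugs.copy()
    # Assumes row k is the reading taken k seconds after midnight.
    seconds = pd.to_timedelta(np.arange(len(out)), unit="s")
    out.insert(0, "time_var", pd.Timestamp(date) + seconds)
    return out


# Stand-in for one day of plug data with made-up readings.
demo = pd.DataFrame({"Fridge": [60.0, 61.0, 0.0], "Kettle": [0.0, 1800.0, 1795.0]})
timed = add_time_column(demo, "2012-09-27")
print(timed)
```

Placing time_var first matches the plotting code, which skips the first column when it iterates over the appliances.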
From the correlation matrix, we can observe that the power consumption of the different appliances is not correlated. This is expected, as each appliance is used at different times of the day, so their consumption patterns should not be similar.
Results
Plot 1 - What is the average total power consumption and average power consumption for phases 1, 2, and 3 for Household 5 from 27.06.12 to 31.01.13, based on smart meter data?
The code generates a linked chart to display the average total power consumption and average power consumption for phases 1, 2, and 3 for Household 5 from 27.06.12 to 31.01.13, based on smart meter data.
The chart consists of two parts: a bar chart showing the average total power consumption per day for Household 5, and a grouped bar chart showing the average power consumption per day for each of the three power phases (l1, l2, and l3) for Household 5. The two charts are linked by a brush selection for the date variable, which allows the user to select a specific time range of interest and see the corresponding data in both charts.
The bar chart shows the average total power consumption per day for Household 5, with the x-axis representing the date range from 27.06.12 to 31.01.13 and the y-axis representing the average energy consumption per day in kWh. The opacity of the bars changes when a selection is made with the brush, so that unselected bars fade out, and the tooltip displays the date and the corresponding energy consumption.
The grouped bar chart shows the average power consumption per day for each of the three power phases (l1, l2, and l3) for Household 5, with the x-axis representing the same date range and the y-axis representing the average energy consumption per day in kWh. The chart is grouped by power phase, and each phase is represented by a different color. When a selection is made with the brush, the opacity of the bars changes accordingly, and the tooltip displays the date and the corresponding energy consumption for each power phase.
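The y-axis is labelled in kWh, while a daily mean of per-second readings is an average power. Assuming the smart meter columns hold real power in watts (an assumption, since the unit is not stated above), the daily energy follows from multiplying the mean power by 24 hours and dividing by 1000. The function name below is hypothetical.

```python
def mean_power_to_daily_kwh(mean_watts: float, hours: float = 24.0) -> float:
    """Convert an average power in watts over `hours` into energy in kWh."""
    return mean_watts * hours / 1000.0


# A household averaging 500 W over a full day uses 12 kWh.
print(mean_power_to_daily_kwh(500.0))  # 12.0
```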
Code
# AVERAGE TOTAL POWER AND POWER PHASES 1,2,3 FROM 27.06.12 to 31.01.13 FROM SMART METER DATA

# Create brush selection for the Date variable
brush = alt.selection_single(fields=['Date'], name='brush')

# First bar chart with powerallphases
bar = alt.Chart(daywisedf5).mark_bar().encode(
    x=alt.X('Date:T', axis=alt.Axis(title='Date', format='%b %y', tickCount=alt.TickCount(interval='month', step=1))),
    y=alt.Y('powerallphases:Q', axis=alt.Axis(title='Average Energy Per Day (kwh)')),
    tooltip=['Date', 'powerallphases'],
    color=alt.condition(brush, alt.value('black'), alt.value('black')),
    opacity=alt.condition(brush, alt.value(1), alt.value(0.2))
).interactive().add_selection(
    brush
).properties(
    height=200,
    width=400
)

# Grouped bar chart with powerl1, powerl2, and powerl3
grouped_bar_chart = alt.Chart(daywisedf5).mark_bar().encode(
    x=alt.X('Date:T', axis=alt.Axis(title='Date', format='%b %y', tickCount=alt.TickCount(interval='month', step=1))),
    y=alt.Y('value:Q', axis=alt.Axis(title='Average Energy Per Day (kwh)')),
    color='variable:N',
    column=alt.Column('variable:N', title='Power Phases'),
    tooltip=['Date', 'value:Q'],
    opacity=alt.condition(brush, alt.value(1), alt.value(0.2))
).transform_fold(
    ['powerl1', 'powerl2', 'powerl3'],
    as_=['variable', 'value']
).interactive().add_selection(
    brush
)

# make plot smaller for linked_charts
linked_charts = alt.vconcat(
    bar,
    grouped_bar_chart,
    center=True
).configure_title(
    fontSize=18,
    anchor='middle'
).properties(
    title=alt.TitleParams(text='HOUSEHOLD 5: AVERAGE ENERGY CONSUMPTION FROM 27.06.12 to 31.01.13 SMART METER DATA', anchor='middle', offset=20)
).configure_view(
    height=100,
    width=250
)

linked_charts
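The transform_fold step in the chart code reshapes the three phase columns into a pair of variable and value columns, so that one encoding can cover all phases. A rough pandas equivalent is DataFrame.melt, sketched here on made-up values to show the resulting long format:

```python
import pandas as pd

# Stand-in for the daily averages with made-up values.
wide = pd.DataFrame({
    "Date": pd.to_datetime(["2012-06-27", "2012-06-28"]),
    "powerl1": [100.0, 110.0],
    "powerl2": [120.0, 115.0],
    "powerl3": [80.0, 95.0],
})

# One row per (Date, phase) pair, mirroring as_=['variable', 'value'].
long_df = wide.melt(
    id_vars="Date",
    value_vars=["powerl1", "powerl2", "powerl3"],
    var_name="variable",
    value_name="value",
)
print(long_df)
```

Each date now appears three times, once per phase, which is the shape the faceted chart needs for its color and column encodings.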
Rationale for design decisions:
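The graph visualises the average total power consumption and the average power consumption of phases 1, 2, and 3 for Household 5.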
Visual encodings: The choice of bar charts for visualizing average power consumption was made to provide an easy-to-understand and clear representation of the data. Bar charts are effective in showcasing differences in values across categories or over time. In this case, the height of the bars represents the average power consumption, making it simple to compare values across dates or between power phases.
Interaction: A brush selection was incorporated into the design to enable users to easily explore and focus on specific time periods. By selecting a range on the bar chart, the corresponding data points in the grouped bar chart will be highlighted, allowing users to examine the power consumption for phases 1, 2, and 3 in more detail. This interactive feature improves the user experience and helps users gain a better understanding of the data.
Animation: The opacity of the bars changes upon brush selection, emphasizing the chosen data points and fading out the others. This visual cue guides the user’s attention to the selected data points and provides a smooth transition between different time periods.
Alternative considerations:
Line chart: A line chart could have been used instead of a bar chart to represent the average power consumption over time. Line charts are useful for showing trends over time. However, bar charts were chosen for their simplicity and ease of comparison between individual data points.
Stacked bar chart: A stacked bar chart could have been used to display the power consumption for phases 1, 2, and 3 in a single chart. However, a grouped bar chart was chosen to make it easier for users to compare the power consumption across phases, as the individual bars representing each phase are clearly visible and not overlapped.
The ultimate choices were made based on the goal of providing an intuitive, easy-to-understand, and interactive visualization that allows users to explore and compare the average power consumption for each phase in a household. The selected visual encodings, interaction techniques, and animations work together to achieve this goal effectively.
Plot 2 - What is the energy consumption over time for each appliance in a household, and how does the consumption of each appliance compare to the others?
The code generates a graph showing the energy consumption over time for each appliance in Household 5 on 27.06.12, as well as a dropdown menu allowing the user to compare the energy consumption of each appliance to the others.
The graph is created using the Plotly library in Python. Each appliance in the household is represented as a line, with the x-axis representing time and the y-axis representing energy consumption in watts. The graph shows how the energy consumption of each appliance varies over time, allowing the user to identify patterns or trends in energy usage.
The dropdown menu allows the user to select which appliances to display on the graph. The default setting is “All,” which displays all of the appliances in the household. However, the user can also select individual appliances to compare their energy consumption to the others.
The title of the graph is “Energy Consumption vs Time for Household 5 on 27.06.12,” and the x-axis is labeled “Time” while the y-axis is labeled “Consumption (Watts).” The dropdown menu is included below the graph and allows the user to easily compare the energy consumption of each appliance to the others.
Code
# Energy Consumption vs Time for Household 5 for 27.06.12
fig = go.Figure()

# Add a trace for each column
for col in data_plugs.columns[1:]:
    fig.add_trace(go.Scatter(x=data_plugs['time_var'], y=data_plugs[col], mode='lines', name=col, visible=True))

# Create the list of buttons for the dropdown menu
buttons = [dict(label='All', method='update', args=[{'visible': [True for col in data_plugs.columns[1:]]}])]
for col in data_plugs.columns[1:]:
    buttons.append(dict(
        label=col,
        method='update',
        args=[{'visible': [True if trace_name == col else False for trace_name in data_plugs.columns[1:]]}]
    ))

# Add the updatemenu with the dropdown options to the layout
fig.update_layout(
    title="Energy Consumption vs Time for Household 5 on 27.06.12",
    xaxis_title="Time",
    yaxis_title="Consumption (Watts)",
    updatemenus=[dict(active=0, buttons=buttons)]
)

fig.show(renderer='notebook')
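The button construction above reduces to one list of booleans per dropdown entry, with one boolean per trace in the order the traces were added. A minimal sketch of that logic in plain Python, independent of Plotly; the function name visibility_masks is hypothetical:

```python
def visibility_masks(trace_names):
    """Return {label: mask}, where mask[i] says whether trace i is shown."""
    # The 'All' entry shows every trace.
    masks = {"All": [True] * len(trace_names)}
    # Each appliance entry shows only the trace with the matching name.
    for name in trace_names:
        masks[name] = [other == name for other in trace_names]
    return masks


appliances = ["Tablet", "CoffeeMachine", "Microwave"]
masks = visibility_masks(appliances)
print(masks["CoffeeMachine"])  # [False, True, False]
```

Each mask corresponds to the 'visible' list passed in a button's args, so a mask must have exactly as many entries as the figure has traces.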